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Abstract — In this article, we introduce the new approach 
fluorescence grid based aggregation (FGBA) to justify a dynam- 
ical model of protein expression using experimental fluorescence 
histograms. In this approach, first, we describe the dynamics of 
the gene-protein system by a chemical master equation (CME), 
while the protein production rates are unknown. Then, we 
aggregate the states of the CME into unknown group sizes. We 
show that these unknown values can be replaced by the data 
from the experimental fluorescence histograms. Consequently, 
final probability distributions correspond to the experimental 
fluorescence histograms. 



I. Introduction 

In the study of protein expression, flow cytometry is a 
promising technique for the analysis of protein regulatory 
system [11], [9], [13]. In a cell colony, flow cytometery 
measures single cell's fluorescent intensity, which repre- 
sents the protein concentration, and draws a fluorescence 
histogram. A fluorescence histogram of a cell colony is a 
plot of the cell count versus measured fluorescent intensity 
[17]. In theory, the process of protein expression has been 
stochastically analyzed to generate a probability distribution 
of protein concentration. However, there are two deficiencies 
to this analysis. First, generated probability distributions 
do not represent the experimental fluorescence histograms, 
since the relation between fluorescent intensity and protein 
concentration is unknown. Second, the protein production 
rate, which is a key parameter in stochastic analysis of 
expression, is not known for different expression states of 
a gene. 

In this paper, we study the expression of a protein called 
Ag43 by a gene named agn43. This protein is not involved 
in feedback regulation, and instead the encoding gene uses a 
mechanism of generating multiple phases in order to regulate 
the protein production. Phase variation describes changes in 
the expression state of the gene that results in mixed cell 
cultures in a colony [16]. A gene is called to have an On, 
Partial, or Off expression state, if it produces protein with 
a high, low, approximately zero rate, respectively. In the 
mechanism of agn43 regulation, between phases with On and 
Off expression states, the gene enters intermediate phases 
that act as buffers and prevent back and forth switching. 
Recently, Lim et al. (2007) proposed a dynamical model 
for the phase variation of agn43 and identified a third 
expression state, Partial, for the gene. They verified the 
model deterministically, and computed the phase variation 
rates of the gene. However, the protein production rates in 
those three expression states are unknown, and the dynamics 
of the protein production is not analyzed. 



As our main contribution, we introduce a new approach to 
justify the dynamical model of gene-protein system by the 
experimental fluorescence histograms. We call this approach 
the fluorescence grid based aggregation (FGBA). First, we 
compute the rate of increase in cell's fluorescent intensity 
by the steady state histograms. This rate has a linear relation 
with the protein production rate. Second, assuming that the 
stochastic dynamics of the gene-protein system is a Markov 
process, we describe this system by a chemical master 
equation (CME), while the protein production rates, for 
different expression states, are unknown. Third, we aggregate 
the states of the CME into groups with unknown sizes, and 
compute the dynamics of the aggregated system. Aggregation 
of Markov chains, also known as sparse grid approximation 
[5] and projection through interpolation [12], has been em- 
ployed to the gene regulatory networks in order to reduce 
the computation time. However, in those studies, the number 
of states being aggregated and the protein production rates 
were known, as opposed to our method. In FGBA method, 
we aggregate the CME based on the fluorescence grid sizes 
in experimental fluorescence histograms. By employing this 
method on the CME ([3]), we achieve the following goals: (1) 
we eliminate the dependence of the CME on protein number, 
and hence, its dependence on unknown protein production 
rates; (2) we define CME as a function of fluorescent 
intensity, solving which gives final probability distributions 
that correspond to the experimental fluorescence histograms; 
and (3) we reduce the size of the differential CME to reduce 
the computation time. Finally, we find an upper bound for the 
evolution of the error caused by employing FGBA method. 

The paper develops as follows. The remainder of this 
section reviews the studied gene and protein. The determin- 
istic and stochastic analysis of the gene-protein system is 
discussed in Sections [H] and III respectively. The FGBA 



method and its error are presented in Subsection III-A 



Numerical results are provided in Section IV Finally, some 
conclusions are drawn in Section [V] 



A. Gene-Protein System 

Antigen 43 (Ag43) is an outer membrane protein in the 
bacteria Escherichia coli and is described as its "most 
abundant phase varying outer membrane protein" [7]. This 
protein is encoded by a single gene called agn43 or flu. 
Flu is an abbreviation of fluffing due to the fact that the 
production of Ag43 causes interspecies cell aggregation by 
Ag43-Ag43 interaction. Hence, the expression of this protein 
enhances biofilm formation. As mentioned above, the phase 
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variation of agn43 regulates production of Ag43. In agn43, 
phase variation is performed by an epigenetic switch. An 
epigenetic switch can be defined as a heritable yet reversible 
switch in gene expression state, which is not mediated by 
a change in DNA sequence [16]. Therefore, agn43 is a 
controllable toggle switch. As a practical device, a toggle 
switch forms a synthetic, addressable cellular memory unit 
and has implications for biotechnology, biocomputing and 
gene therapy [4]. 

The dynamics of phase variation in agn43 is studied sepa- 
rately by [11] and [16]. A schematic of the model proposed 
by [11] is illustrated in Figure [T] The methylation state 
of three GATC sequences along the gene decides whether 
the expression is On (methylated) or Off (unmethylated). 
The methylation state of the GATC sites is determined 
by competitive binding between OxyR, a global oxidative 
stress protein, and DNA adenine methylase (Dam). Since 
there is no DNA demethylation reaction, gene replication 
is essential to the phase variation. After each replication: 
fully methylated agn43 (Mf), whose expression state is On, 
becomes hemimethylated (Mb); the hemimethylated agn43 
generates one hemimethylated and one unmethylated and 
naked agn43 (Un)', and the gene in the rest of phases keeps 
its initial phase. In Lim's model, the expression state of Afy 
is said to be either On or Partial, while we assume this 
expression to be On, according to the heritable expression 
state of agn43 [15]. OxyR can bind to Un and generate 
an unmethylated agn43 with OxyR (Uo)- In Lim's phase 
variation model, agn43 in Un and Uo phases partially 
transcribes protein, as opposed to the model proposed by 
Marjan et al. (2008). In Lim's model the DNA in Uo phase 
can undergo a conformational change, giving rise to an Off 
phase (O) with Off expression state. 

II. Deterministic Analysis 

The deterministic dynamics of the agn43-Ag43 system can 
be divided into three parts: gene's phase variation, protein 
production, and the gene replication during cell division. 

A. Dynamics of Phase Variation 

We briefly review the dynamics of agn43 phase variation 
in Lim's model. According to Section |I-A| five phases and 
three expression states are assigned to agn43: Mp, Mh, Un, 
Uo, and O phases with On, On, Partial, Partial, and Off 
expression states, respectively. As illustrated in Figure [T] the 
dynamics of these five phases can be written as: 

M F (t) = k M M H (t), M H {t) = k H U N (t) - k M M H {t), 
U N {t) = k_ U {t) - {k + k H )U N {t), 
U {t) = k- R 0{t) + k U N {t) - (fc_ + k R )U (t), 
0(t) = -k- R 0(t) + k R U o (t). 

According to the supplementary methods of [11], fc^/ = 4.3, 
jr 2 ^ = 3.7, and = 15.8. Based on our sensitivity 

analysis, we used fc# = 0.4. Here, we need two more 
equalities to compute all the phase varying rates. Owing 
to rare On-Off switching of the agn43 (7 x 10~ 3 cells per 
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Fig. 1. Lim's dynamical model of agn43-Ag43. The gene has five 
phases Mp, Mjj, Un, Uo, and O, with On, On, Partial, Partial, and 
Off expression states, respectively. Depending on the expression state, the 
protein x g is produced with three different rates f3 on , /3p ar tial. an d ftoff, but 
degrades with fixed rate 7. The arrows on the left represent the effect of 
replication on the phase of the gene. 



generation), kn -C ko, hence we assume that ko = 1000fc#. 
Finally, considering the steady state of the system, it can be 
computed that k R = 0.118fco 

B. The Dynamics of the Protein Production 

A useful function that describes protein production rate 
in many real genes is Hill function [2]. According to this 
function, in the absence of activator and repressor, the protein 
production rate is constant. As discussed in Section [LA] there 
is no feedback regulation in the production of Ag43 and the 
concentration of external factors, OxyR and Dam, during cell 
growth is constant by over expression. Thus, the dynamics 
of protein production can be described by 



Xg{t) = p-JX g (t), 



(1) 



where x g (t), (3, and 7 represent the concentration, production 
rate, and degradation rate of the reporter protein, respectively. 
In the experiments by [11], green fluorescent protein (GFP) 
is used as a reporter, and its production is regulated by agn43. 
GFP exhibits fluorescence in the cell, that can be measured 
by flow cytometry. Based on the method of generating and 
amplifying the expression of GFP in [11], we assume that 
there is a linear relation between the rates of Ag43 production 
and GFP production in the cell. However, their degradation 
is independent of each other, and the latter is measurable by 
flow cytometer. Therefore, we consider the dynamics of GFP 
production to verify the model by experimental results. 

The rate 7 is the sum of dilution and degradation rates. 
Dilution is the reduction of protein density due to increase 
in cell volume. Since a flow cytometer measures the total 
fluorescence of a cell rather than the density of fluorescence, 
the dilution rate is zero here. Degradation rate is computed by 
protein's half life r while its production rate is zero. That is, 
x b {t) = x g (0)/2 = x g (0)e-~< T , and thus 7 = ln2/r. Half 
life of wild type GFP is 26 hours [3], and one generation 
takes 85 minutes [11], therefore, 7 is equal to 0.0378 protein 
per generation. 

The protein production rate (3 depends on the expression 
state of the gene, On, Partial, or Off. Consider a gene that 
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Fig. 2. Three fluorescence histograms of cell colonies in three different 
expression states after 20 hours. This plot tells us that the steady state 
fluorescent intensity of a cell whose agn43 has On, Partial, or Off expres- 
sion state is 10 3 ' 5 , 10 1 ' 8 , or 10 a.u., respectively. Reprinted figure with 
permission from [11], ©2007, by Nature Publishing Group. 



remains in one expression state as time goes to infinity. 
Then, the protein concentration of the cell reaches a steady 

i x g (t) = 0. It follows from 



state Xg t00 , and thus lim^ 



equation ([TJ that (3 = -fXg ^, protein per generation. Our tool 
to compute x giOC is the experimental fluorescence histogram, 
e.g., Figure [2] However, for each expression state, such 
histogram gives us the fluorescent intensity of a cell in 
steady state in arbitrary units (a.u.) instead of the protein 
concentration. 

In a cell, the fluorescent intensity Xf depends linearly 
on protein (GFP) concentration, see [1] and [14]. That is, 
Xf(t) = (j,x g (t), where we call u, the fluorescence-GFP ratio, 
and its value unknown. Taking the derivative of both sides 
gives 

x f {t) =n{P- 1Xg {t)) =nP-~iXf(t) =0f—YX f (t), (2) 

where j3f denotes the rate of increase in fluorescent intensity 
of the cell. According to Figure [2] the steady state fluorescent 
intensity x / j00 of a cell whose agn43 has On, Partial, or Off 
expression state is 10 3 5 , 10 18 , or 10 a.u., respectively. It 
follows from /3 f = 7X/ :OC that /3/ )0n = 238, /3/, pa rtiai = 3, 
and /3/,off = 0.37 a.u. per generation. 

C. Replication Rates 

Replication of the cell has two effects in our model. First, 
we assume that the protein concentration of the cell becomes 
half of its initial value. This assumption is based on two 
reasons: "in immunofluorescence studies of Ag43-producing 
E. coli, the protein is seen evenly distributed over the surface 
of the entire cell" [6]; and, in our stochastic analysis we have 
observed that employing binomial distribution for protein 
concentration after replication has a negligible effect on the 
final probability distribution, see Figure [5] Second, after 
replication the gene's phase vary: any Mp gene becomes 
Mh\ half of Mh genes become Un, and the other half 
remain Mh ; and genes in the rest of phases keep their initial 
phase, see Figure [T] 

III. Stochastic Analysis 

We aim to describe the dynamics of the protein expression 
by the phase varying gene agn43 by a Markovian process. 
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Fig. 3. Each circle represents one possible configuration for a cell that 
contains agn43, based on the cell's protein concentration (horizontal axis) 
and its gene's phase (vertical axis). The transitions between configurations, 
shown by arrows, is possible through phase variation (red arrows) or change 
in protein concentration (black arrows). For brevity, the effect of cell 
replication on protein concentration is not illustrated. 



In other words, we compute the probability of a cell being 
in any configuration, which is here determined by its gene's 
phase plus its protein concentration, as a function of time. 
Therefore, a cell's configuration changes based on: (1) phase 
variation rates, (2) protein production and degradation rates, 
and (3) replication rates, see Figure [3] For each cell, the 
probability of having any such configuration is a function 
of time, and the union of those probabilities makes up the 
probability distribution vector P(t). More specifically, the 
first five entries of P(t) represent the probability of a cell 
having no protein and a gene with Mp, Mh, Un, Uo, and 
O phases, respectively; the second five entries represent the 
probability of the cell having one protein and a gene in 
mentioned phases; and so on. This probability vector evolves 
according to a continuous-time Markov process, which is 
called the chemical master equation (CME): 



P(t) = AP(t) + DP(t), 



(3) 



where the transition matrix A contains phase varying rates, 
and protein production and degradation rates, and D is the 
replication matrix. According to the system's deterministic 
dynamics, we compute the building blocks of the transition 
matrix, i.e., phase variation matrix K and protein production 
matrix B: 
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If we denote the identity matrix of size five by J5, then 

K-B 7/5 

B K-B-jJ 5 27/5 

1= B K — B — 27/5 37/5 



(4) 

After replication, as mentioned in Section II-C any configu- 
ration transforms into another configuration with half protein 
concentration. Hence, the replication matrix can be written 
as D = —I + D + . The negative identity matrix represents a 
continuous reduction in the probability of all configurations 
due to reduction in protein concentration. The D + matrix 
contains the information on phase change and is composed 
of the blocks 





0.5 
0.5 







(5) 



where the protein concentration of the ith five configurations 
is approximately half of that of the jth five configurations. 
Note that in computing the rates, one unit time is equal to 
one generation or the time between two replications. 

A. Fluorescence Grid Based Aggregation 

Aggregation or lumping of Markov chains has been known 
for a long time [10]. Here, we aggregate the states of Markov 
chain P(t) = AP(t) into groups of mj, m 2 , . . . states by a 
linear aggregation operator E: 



E = 



1...1 0. 



Therefore, the aggregated probability vector at time t, is 
equal to P agg (t) = EP(t). Taking the derivative of both 
sides gives P agg {t) = EP(t) = EAP(t). To find the 
dynamics of P agg (t) independent of P(t), we define P(t) 
as an approximate function of P agg {t). We assume that the 
probability of being in state i is equal to the aggregated 
probability of being in the group that contains i divided by 
the number of states in that group, that is, 



P(t) ~ FP agg (t), 
where F is the disaggregation operator and 



(6) 



F = 



1 

mi 



1 

m 1 







mi 



1 

m 2 



1 

m 2 



m 2 



Now, consider the following approximated aggregated 
Markov chain 



P a {0) = P agg (0) = EP(0), 
P a (t) = EAFP a (t). 



(7) 



Based on assumption (|§J, the evolution of the solution P a (t) 
can approximate the evolution of the aggregated probability 
vector P agg (t). 

Remark 1 (Properties of aggregation). First, our linear ag- 
gregation method is not lumpable [10], or unbiased [8]: a 
Markov chain is lumpable with respect to an aggregation 
if the transitions and states inside any partition group also 
compose a Markov chain. The class of Markov chains which 
admits this exact aggregation was investigated in [10] and 
proved to be quite narrow. In our aggregation method the nec- 
essary and sufficient condition for lumpability of the CME 
is CBAC = AC. It is easy to see that this equality does 
not always hold, and thus our aggregation is not lumpable. 
Second, our aggregation is regular [8]: an aggregation is 
regular if it is both linear and state partitioning. Being state 
partitioning means that the aggregator should assign each 
state of Markov process to be aggregated to exactly one super 
state. It can be easily checked that our aggregation is state 
partitioning. 

Before proceeding to the theorem, define fluorescence rate 
matrix Bf to be equal to a protein production matrix B 
whose entries (e.g., (3 on ) are replaced by the corresponding 
fluorescence based production rates (e.g., /3/,onX see Sec- 
tion [U^B] 

Theorem III.l (FGBA algorithm). Consider a gene-protein 
system that can be described by the CME P(t) — AP(t), 
where the transition matrix A is given by equation Q. From 
the experimental fluorescence histograms, extract the fluores- 
cence grids Ai, A2, . . . and the fluorescence rate matrix Bf. 
Then the solution to the following fluorescence based CME 
simulates the experimental fluorescence histogram: 

P f (t) = A f P f (t), 

I 7-> 



K-£B t 
X7 B f 



K 











K 



A 2 2 5 
A 2 J 5 





7(Ai+A 2 ) j 
A 3 15 
7(A!+A 2 ) T 





7(A!+A 2 +A 3 ) 



(8) 

where Pf(0) is computed based on the initial state of the 
system in the experiments. 



Remark 2. In Theorem III.l the phase variation matrix K 
can be any arbitrary matrix with zero column sum. 

Proof: [Proof of Theorem |III. 1| First, we aggregate 
the states of the initial CME P(t) = AP(t) by lumping 
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the configurations with different protein numbers but same 
phase, that is, the configurations along the x-axis of Figure [3] 
Therefore, in above mentioned aggregation (disaggregation) 
operator, each entry Eij (F L j) is replaced by a five by five 
block Eijls (Fijls), and we denote the new aggregation (dis- 
aggregation) operator by E (F). Employing these operators, 
the dynamics of the approximated aggregated CME will be: 

P a (0) = EP(0) 
P a (t)=AMt), 
A a = EAF 



K- —B 

mi 

^-B 



o 



K 



l 

1112 



^B 
B 



K 



m 2 ' J 







7(mi+TO 2 ) 

7(mi+m 2 ) 
m 3 



-B 



(9) 

In essence, the (5i)th entry of P a (t) represents the probabil- 
ity of having a protein concentration between mi+- ■ ■+mi-x 
and mi + • • • + m, proteins at time t. Notice that the group 
sizes rrii and the protein production rates in B are unknown. 
Now, assume that for i — 1,2,..., the each group size njj 
satisfies 

m, = maxjxg € N| fJ,x g < Aj}, (10) 

where x g is the protein number and p, is the fluorescence- 
GFP ratio, defined in Section |II-B| Roughly speaking, rrii 
is the number of proteins in one cell that increases the 
fluorescent intensity of the cell by Aj. Since for the exper- 
imental fluorescence grids in histograms of [11], mj's tend 
to be large, one can see that p/trii ~ Aj. Then, according to 
equation |2]), 

P* ^ Pf,*/l* _ Pf,* 
mi Ai/fi A, 

Moreover, for any k G {1, 2, . . . }, 

m, + TYij (A, + Aj)/[i Aj + Aj 
m fe A fe //i A fe 

Therefore, the fluorescence based CME |8| is a direct 
consequence of approximated aggregated CME |9| under 
assumption firrii = Aj, and the unknown values n and 
TOj's are eliminated. Note that the ith entry of Pf(t) is now 
the probability of cell having fluorescent intensity between 
AH h Aj_! and Ai H h Aj. ■ 

Proposition III.2 (Evolution of error in FGBA method). 

Consider the dynamics of a gene-protein system with only 
one gene phase, hence one protein production rate P, is 
described by the CME dotP(t) = AP(t). By employing the 
FGBA method, the system's dynamics can be approximated 
by the fluorescence based CME Pf(t) — AtPf(t). Assume 
that: 

1) there exists r € K>o such that the fluorescence grids 
satisfy Aj < rAj_x; and 



2) there exists e G M>o such that the group sizes satisfy 
l/xrrij -Ai| < e. 
Let e{t) denote the error in the expected value of the final 
probability distribution, that is, 

e(t)=pE[P(t)]-E[P f (t)}, 

then e(t) can be upper bounded by a well defined function of 
E[Pf(t)], e, r, and the minimum and maximum fluorescence 
grid. 

Proof: This error in FGBA method is caused by 
two reasons: aggregating the states of initial CME; and 
approximating the group sizes m. L by fluorescence grid sizes, 
instead of employing the exact equation ( fTO) . Therefore, 

e(t)=nE[P(t)]-E[P f (t)] 

= (pE[P(t)} - nE[P a (t)}) + (pE[P a (t)} - E[P f (t)]) 

= pei(t) + e 2 (f). 

We first compute the first term's upper bound: 

ei (t)=E[P(t)]-E[P a (t)] 

= [0 1 2 ... }P{t) - [0 mi mi + m 2 ... ]P a {t). 

Taking the derivative of both sides gives 

ei(t) = [0 1 ...]AP(t) 

-[0 mi TO1+TO2 ...]A a P a {t), 

where A a — EAF, and E and F are the aggregation and 



disaggregation operators introduced in Section III-A Hence 



ei(t) = [P P- 1 P-2 1 



m 2 m 3 
= pi T P{t) - 7(0 1 2 



Pit) 
m 2 ) 



]P(t) 



/31 T P a (t) + 7 [0 



mf m 2 (mi + m 2 ) 



}Pa(t) 



}Pa(t). 



Clearly, l T P a (t) = l T P(t) = 1. By adding and subtracting 
7ei(f) we have 



ei(t) = 7ei(t) 



-7[0 —-mi 
m 2 



m 2 (m\ + m 2 ) 



m 3 



-( mi +m 2 ) ...}P a (t). 



Integrating from to t gives 



ei(t) = ei*ei(0) 



e T(t-r) 7 



[0 



m\{m 2 — m 1 ) (mi + m 2 )(m 3 — m 2 ) 



}Pa{T)dT. 



m 2 m 3 
According to the initial value P a (Q) — EP(0) we have 

ei(0) = [012 ...]P(0)-[0 mi mi + m 2 . . .]EP(0) 
= P 2 (0) + --- + (mi-l)P mi (0) 

+P mi+ i(0) + ■ • • + (m 2 - l)P m2 (0) 
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Clearly, ei(0) is a convex combination of {1,2,..., m max }, 
where m max is the maximum group size. Hence, denoting 
the maximum fluorescence grid by A max , 



ei(0) < <(A max i 



Therefore, 



e T(*-r) 7 p TOl TOl+TO2 ...]P a ( r )dr. 



According to the assumptions [T] and [2] the 
value maxi{ " 1 '" m '~ 1 } can be upper bounded by 
1 — A m!n /(rA m j„ + re + e), which we denote by 
r: 

< ^ + V + f(l - e-^) 7 S[Pa(*)]. 

Second, we compute the upper bound on error e2(t): 

e 2 (t) = fi[0 mi mi+m 2 ...]P a (t) 

-[0 Ax A : + A 2 ...]P f {t) 
< [0 Ai Ai+A 2 ...}P a (t) + elP a (t) 

- [0 Ai Ai + A 2 ...]P/(i). 

The value elP a (t) is equal to e, and using assumption [2] 
one can compute the scalar function g(e, i) such that e Aat < 

g(e,t)e A f t , then 

P a (t) = e A ^P a (0) < g(e,t)e A ^P f (0) = g(e,t)P f (t). 
Consequently, 

e 2 (t)<(g(e,t)-l)E[P f (t)]+e, 

and finally 

/zei(i) + e2(t) < (A 

max ~r eye 

+ (f(l-e-^) 7 + 5 (e,i) +e)E[P f (t)}. 



IV. Numerical Results 

In the experiments done by Lim et al., they let six 
separate colonies of E. coli grow for 20 hours. Each colony 
started from a cell that contains a mutant of agn43 with Off 
expression state. The gene was mutated by deleting different 
parts of the upstream sequences of agn43. They claimed 
that the only difference in the dynamics of gene-protein 
system in these mutants is the ratio kn/k_R, see Figure [T] 
According to the steady state of phase varying dynamics, 
the ratio ka/k-n is equal to the fraction of unmethylated 
cells with Off expression, and is experimentally found to 
be 15.8,8.9,5.5,4.3, 1, and 0.1 for the six mutants. Finally, 
they measured the fluorescence of the cells in each colony 
with flow cytometer and drew fluorescence histograms, see 
Figure |4](a). In these histograms, the fluorescence grids A< 
are equal to 10 4 - 10 i_1 , where i e 0.05{0, 2, . . . , 40}. 

Now, to generate analytical fluorescence histograms, we 



gene-protein systems of the mutants of agn43 (each system 
has one of the six mentioned values for kii/k-R, and the 
rest of parameters remains constant.) Knowing the phase 



variation rates, Section II-A degradation and fluorescence 
increase rates, Section |II-B| and fluorescence grids, the 



fluorescence based transition matrix Af of equation ([8) can 
be computed. Therefore, 



P f (t)=A f P f (t)+D f P f (t), 



(11) 



where D 



f 



employ the FGBA method stated in Theorem III. 1 to the 



—I + Dj is the fluorescence based replication 
matrix. The Dj is composed of blocks Df-, given by equa- 
tion |5]), while the fluorescence of the ith five configurations 
is approximately half of the jth five configurations. 

Remark 3. In Lim's model, the replication of the gene 
is described by a discrete time reaction. Accordingly, we 
first employed a discrete time replication in our stochastic 
analysis. However, the variance of the resulting probability 
distribution did not match the variance of the experimental 
fluorescence histograms, see Figure [5] This disagreement 
can be explain as follows: The experimental histograms are 
taken from a colony of the cells, and in a colony not all the 
cells replicate at the same time. Therefore, a continuous time 
replication can better capture the behavior of a large number 
of cells than a discrete time replication. 

Equation ( fTTj ) is an infinite dimensional ODE, hence we 
truncate this equation into a finite dimensional equation. The 
finite dimensional CME should contain configurations whose 
protein concentration is between zero and the maximum 
number of proteins in one cell, or equivalently, configurations 
whose fluorescence is less than the maximum value observed 
(10 4 a.u.). The solutions to the final CME's for the six 
mentioned mutants are plotted in Figure |4](b). 

V. Conclusion and Future Work 

As our main result, we introduced a new approach to jus- 
tify the dynamical model of protein expression by the exper- 
imental fluorescence histograms. We described the dynamics 
of a gene-protein system, whose protein production rates are 
unknown, with a chemical master equation (CME). Based on 
the resolution of the experimental histograms, we aggregated 
the states of the CME, however, the number of states in 
each aggregated group is also unknown. We proved that the 
unknown protein production rates and number of states in 
one group can be replaced by the fluorescence increase rate 
and the fluorescence grids from the histograms, respectively. 
Therefore, the final probability distribution is the theoretical 
fluorescent histogram of the gene-protein model, and can be 
verified by the experimental fluorescence histograms. One 
future challenge is to compute the parameters of a gene- 
protein system via its fluorescence histograms. The solution 
to the CME, which is a probability distribution, has been 
numerically approximated from the parameters of the CME, 
see [18]. A reverse analysis of this method can help us find 
the parameters of a gene-protein system by experimental 
fluorescence histograms. 
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Fig. 4. (a) The fluorescence histograms for six separate colonies, each 
starting from a mutant of agn43 with Off expression state. Lim et al. 
claims that the only difference in the dynamics of these mutants is the 
ratio , R , which is equal to 15.8,8.9,5.5,4.3,1, and 0.1 from top to 

— R 

bottom. Reprinted figure with permission from [11]. ©2007, by Nature 
Publishing Group, (b) The probability distribution of fluorescent intensity 
resulting from solving the aggregated CME of equation J5J. Each plot is 
obtained by solving the model with one of the six mentioned values for 
I kft , and the rest of parameters remains constant. These plots proves that 

fc — R 

our method can make a phase varying model verifiable by the fluorescence 
histograms. 




12 3 4 

Total cell fluorescence (login a.u.) 



Fig. 5. (a) The experimental fluorescence histogram of a colony that starts 
from a mutant of agn43 with Off expression state and the ratio , fcfi = 1. 

fc — R 

Reprinted figure with permission from [11]. ©2007, by Nature Publishing 
Group, (b) The probability distribution of fluorescent intensity resulting 
from solving the fluorescence based CME with discrete time replications for 
the same mutant. We assumed that the fluorescence becomes half in each 
replication, (c) The probability distribution of fluorescent intensity resulting 
from solving the fluorescence based CME with discrete time replications for 
the same mutant. Here, the replication matrix is constructed by a binomial 
probability distribution, in order to increase the resulting variance. This 
figure tells us that a model with discrete time replication can not capture the 
variance of the experimental fluorescent intensity distributions. Moreover, 
employing a binomial probability distribution for replication only slightly 
increases this variance. 
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